Information Theoretic Model Validation for Spectral Clustering
Abstract
Model validation constitutes a fundamental step in data clustering. The central question is: Which cluster model and how many clusters are most appropriate for a certain application? In this study, we introduce a method for the validation of spectral clustering based upon approximation set coding. In particular, we compare correlation and pairwise clustering to analyze the correlations of temporal gene expression profiles. To evaluate and select clustering models, we calculate their reliable informativeness. Experimental results in the context of gene expression analysis show that pairwise clustering yields superior amounts of reliable information. The analysis results are consistent with the Bayesian Information Criterion (BIC), and exhibit higher generality than BIC.
Similar resources
Information theoretic model selection in clustering
Model selection in clustering requires (i) to specify a clustering principle and (ii) to decide an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective where the uncertainty in the data set induces an uncertainty in the solution space of clusterings. A clustering model, which can tolerate a higher level of noise in the data th...
Information Theoretic Pairwise Clustering
In this paper we develop an information-theoretic approach for pairwise clustering. The Laplacian of the pairwise similarity matrix can be used to define a Markov random walk on the data points. This view forms a probabilistic interpretation of spectral clustering methods. We utilize this probabilistic model to define a novel clustering cost function that is based on maximizing the mutual infor...
A Comparative Study of Spectral Clustering and Information-theoretic Co-clustering for Video Shot Categorization
Automatic categorization of video shots is important in video indexing and retrieval. To improve the effectiveness of video shot categorization, current researchers have addressed two major issues: i) spatio-temporal coherence from shot to shot, and ii) bipartite correlation between descriptive features and shot categories. In recent works, spectral clustering and information-theoretic co-clust...
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...
Noise Thresholds for Spectral Clustering
Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfec...